Abstract

Despite the significant proliferation of Internet services in recent years, technology for computer-supported cooperative work and groupware has not progressed at the same rate. A wider distribution of the work force motivates the need for networked multimedia and groupware at Internet scope and for larger groups of end users. In particular, synchronous telecollaboration enables people in different geographic locations to bridge time and space by sharing and jointly manipulating multimedia information in real time and at various levels of granularity. This aspect stands in contrast to legacy client-server applications such as Internet radio broadcast or video-on-demand, and to asynchronous, document-centric collaboration tools like email, instant messaging, or chat rooms. In this paper, we provide a framework for network-supported synchronous multimedia groupwork at Internet scope and for large user groups. Contributions include a novel classification of such systems with respect to the scale and scope of interaction, a formal framework for Internet sessions and for mediating access to concurrently shared resources, a taxonomy of crucial elements in cooperative applications, and a discussion of a generic network coordination protocol to sustain live interaction among concurrently active user groups. The core ideas put forward in this paper are useful for the characterization and rapid prototyping of a new generation of collaborative applications.
1 INTRODUCTION

In contrast to stand-alone applications, where the user interacts only with a computer system, engineering telecollaborative systems is much more complex because it involves user, network, and host-related issues, such as human factors, Quality-of-Service, and heterogeneous application platforms. The end-to-end interaction manifests itself between users, not end hosts, and users ideally expect a telecollaboration environment that provides a quality of interaction close to that of a face-to-face meeting. Limitations in the availability and accessibility of resources in the shared workspace of a telecollaborative system create contention, competition, and conflict among users, and make it necessary to deploy coordination mechanisms to reach consensus on how to use the resources jointly and effectively. Conflicts that stall the workflow may occur before and during resource allocation to users, as well as during actual usage. Telecollaborative services build on the provision of group coordination mechanisms, which manage access, manipulation, distribution, and presentation issues between users and shared resources. Such coordination mechanisms are necessary to allow users to achieve individual goals in the context of group-centered remote interaction, when telepresence [3] substitutes for physical presence. Cerf et al. [6] pointed out the importance of transatlantic collaboration infrastructures in a memorandum in 1991.

Software to support collaborative work, generally termed groupware [11] or workgroup computing software, initially referred only to systems supporting the asynchronous exchange of text documents, but more recent connotations include multimedia-based, synchronous interaction. We focus on group coordination protocols, which embrace multicasting and consider network conditions in the coordination processes between hosts. They complement efforts on group membership known from distributed systems, and on multicasting as an efficient message dissemination mechanism for group communication. This extended abstract outlines our work on a group coordination architecture, with a focus on the formal modeling of its key elements and protocols. Section 2 discusses the group coordination framework and its main architectural aspects, and outlines a generic coordination protocol. Section 3 concludes the paper.

2 COORDINATION FRAMEWORK

We present a formal view of entities and actions in coordination-centric systems, refining earlier efforts [24, 25] on the definition of coordination and control processes in collaborative multimedia systems. Candan et al. [5] focus on algorithms for the collaborative composition and transmission of media objects under given quality constraints, and on their presentation in collaborative group sessions. We picture a computer network as a graph with nodes (stations, hosts) $V$ sending messages across links (channels) $E \subseteq V \times V$. A connection is a unidirectional or bidirectional transmission link from a sender node to a set of receiver nodes.

Definition 1 A collaboration environment $\Gamma$ in a computer network is a tuple

$$\Gamma = \langle \mathcal{S}, \mathcal{U}, \mathcal{R}, \mathcal{F} \rangle$$

where $\mathcal{S}$ is a set of sessions $\Sigma$, $\mathcal{U}$ is a set of users (hosts, processes, agents, participants), $\mathcal{R}$ is a set of shared resources (media), and $\mathcal{F}$ is a set of floors controlling the resources.
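As a rough illustration, the tuple of Definition 1 can be held in a small container that also checks the implied invariant that every floor controls a resource in $\mathcal{R}$. This is a minimal Python sketch, assuming (as Definition 6 later makes precise) that each floor names the single resource it controls; the class names, the reduction of entities to integer identifiers, and the `dangling_floors` check are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass(frozen=True)
class Floor:
    fid: int  # floor identifier
    rid: int  # identifier of the single resource this floor controls


@dataclass
class CollaborationEnvironment:
    """Gamma = <S, U, R, F>; sessions, users, and resources are
    reduced to their identifiers for brevity (an assumption)."""
    sessions: Set[int] = field(default_factory=set)
    users: Set[int] = field(default_factory=set)
    resources: Set[int] = field(default_factory=set)
    floors: Set[Floor] = field(default_factory=set)

    def dangling_floors(self) -> Set[Floor]:
        """Floors whose controlled resource is not in R (a consistency error)."""
        return {f for f in self.floors if f.rid not in self.resources}
```

For example, an environment with resources `{100}` and floors `{Floor(1, 100), Floor(2, 200)}` reports `Floor(2, 200)` as dangling, because resource 200 was never injected.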

2.1 Entities
2.1.1 Sessions

A session provides the infrastructure for cooperation and collaboration.

Definition 2 A session $\Sigma \in \mathcal{S}$ is a tuple

$$\Sigma = \langle Sid, T_i, T_e, A_S, L \rangle$$

where Sid is a unique identifier within $\Gamma$, $T_i$ is the initiation or announcement time, $T_e$ is the ending time, and $A_S$ is a list of attributes characterizing the session at level $L$. A conference is a set of sessions $\Sigma_i \in \mathcal{S}$, where $i > 1$.

Sid is a unique session identifier per collaborative environment, whose sequence number space wraps around in correlation with the turnover rate and lifetime of sessions in $\Gamma$. The times may reflect real time or logical time, or define a lifetime interval $\Delta = T_e - T_i$. $L$ denotes the session level (default 0).
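The wrap-around of the Sid space and the lifetime $\Delta = T_e - T_i$ can be sketched as follows. This is a minimal Python sketch, assuming a hypothetical 16-bit identifier space and a policy of skipping identifiers still held by live sessions; the names `SessionIdAllocator` and `Session` are illustrative, not part of the framework.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Set

SID_SPACE = 2 ** 16  # hypothetical size of the session identifier space


class SessionIdAllocator:
    """Hands out Sids from a wrapping sequence space, skipping Sids
    that still belong to live sessions (an assumed reuse policy)."""

    def __init__(self, space: int = SID_SPACE) -> None:
        self.space = space
        self.next_sid = 0
        self.live: Set[int] = set()

    def allocate(self) -> int:
        for _ in range(self.space):
            sid = self.next_sid
            self.next_sid = (self.next_sid + 1) % self.space  # wrap around
            if sid not in self.live:
                self.live.add(sid)
                return sid
        raise RuntimeError("session identifier space exhausted")

    def release(self, sid: int) -> None:
        self.live.discard(sid)  # session ended, Sid may be reused


@dataclass
class Session:
    sid: int
    t_i: float  # initiation or announcement time
    t_e: float  # ending time
    attrs: Dict[str, Any] = field(default_factory=dict)  # A_S
    level: int = 0  # L, default 0

    @property
    def lifetime(self) -> float:
        return self.t_e - self.t_i  # Delta = T_e - T_i
```

How large the space must be is exactly the trade-off the text mentions: a high session turnover or long-lived sessions require a larger space before wrap-around risks colliding with a live Sid.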

$A_S = (M, O, C)$ describes the purpose and orchestration of a session in terms of membership $M$, organization $O$, and control $C$, as shown in Figure 1. Szyperski [31] characterizes session types in a similar but less refined way, according to the model of interaction (controlled, dynamic, static) and the data flow (1-n, n-1, m-n). For instance, a lecture is a controlled, long-term interaction between one sender and n receivers. A typical n-1 session is telemetry, and a whiteboard session is typically m-n. Our session characterization applies to specific collaborative applications, as well as to generic session types in the spectrum of real-time collaborative work, such as lectures, business meetings, labs, panels, brainstorming meetings, exams, interviews, or chats.

Figure 1. Session attributes.

Membership reflects the composition of the user group in the session. Participation specifies whether information is exchanged unilaterally or bilaterally relative to a host, which affects user access rights and data flow. Interactive sessions may be symmetric, i.e., all users have the same view on shared resources (WYSIWIS), or asymmetric, where users maintain individual views on the same shared data space (relaxed WYSIWIS) [29]. Size specifies a small (< 5), medium (< 100), or large (> 100) number of users, which affects the scalability of the coordination mechanism. Accessibility declares whether a session is open, allowing any user to join, or closed, allowing participation by invitation only. Authorization specifies whether coordination primitives may use read-only, read-write, or write-only privileges for the entire session. Users may also have individual, role-based authorizations.
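Three of the membership attributes (size, accessibility, and session-wide authorization) translate directly into checks a coordination layer might run when a user joins or issues a primitive. This is a minimal Python sketch under stated assumptions: the text leaves a size of exactly 100 unassigned, and the sketch counts it as large; the `Privilege` flag encoding and the names `size_class` and `Membership` are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Flag
from typing import Set


class Privilege(Flag):
    READ = 1                   # read-only
    WRITE = 2                  # write-only
    READ_WRITE = READ | WRITE  # read-write


def size_class(n_users: int) -> str:
    # Thresholds from the text: small (< 5), medium (< 100), large (> 100).
    # Exactly 100 is unassigned in the text; this sketch calls it large.
    if n_users < 5:
        return "small"
    if n_users < 100:
        return "medium"
    return "large"


@dataclass
class Membership:
    open_session: bool        # Accessibility: open vs. closed
    authorization: Privilege  # session-wide Authorization
    invited: Set[str] = field(default_factory=set)

    def may_join(self, user: str) -> bool:
        # Open sessions admit anyone; closed sessions admit invitees only.
        return self.open_session or user in self.invited

    def allows(self, needed: Privilege) -> bool:
        # The session-wide authorization must cover the requested privilege.
        return needed in self.authorization
```

Individual, role-based authorizations would refine `allows` per user; the sketch only models the session-wide default.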

Organization entails specifics on how the session is to be orchestrated. Dataflow describes how data are multiplexed among users, with a 1-1, 1-n, or 1-m transmission model and with unicast, broadcast, or multicast in a session of n users, where m < n. Delivery can be ordered or unordered. Duration discerns between sessions with a longer lifetime (persistent) and short-term sessions, where the precise timing modalities are case-specific and left open. Scope specifies the hop limit for packets sent by hosts in a particular session, similar to the Time-To-Live semantics in IP, which allows sessions to be constrained to a geographic range in order to retain privacy or to limit dissemination to a specific group. Media composition defines whether the session uses a single medium, such as audio only, or mixed media, e.g., a video-audio combination. Conduction refers to the session agenda and moderation style, which can be either tightly coupled, i.e., all users know about each other and follow some agenda in the style of "Robert's Rules of Order" [26], or loosely coupled, where the exchange is not prescribed. Sessions can be flat (L = 1) or maintain two or more levels with nested groups (L > 1).
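The effect of the Scope attribute can be illustrated by computing which nodes a session packet reaches under a given hop limit. This is a minimal Python sketch, assuming the hop limit counts link traversals from the sender and that each forwarding node decrements it, dropping the packet at zero; real IP TTL handling differs in detail, and the function name and adjacency-list format are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def reachable_within_scope(links: Dict[str, List[str]], source: str,
                           hop_limit: int) -> Set[str]:
    """Nodes a session packet from `source` can reach when every hop
    decrements the hop limit and forwarding stops once it hits zero."""
    reached = {source}
    frontier = deque([(source, hop_limit)])
    while frontier:
        node, ttl = frontier.popleft()
        if ttl == 0:
            continue  # scope exhausted: do not forward further
        for neighbor in links.get(node, []):
            if neighbor not in reached:  # BFS visits each node at its minimum hop count
                reached.add(neighbor)
                frontier.append((neighbor, ttl - 1))
    return reached
```

On a chain `a -> b -> c -> d`, a hop limit of 1 confines the session to `{a, b}`, while a limit of 2 extends it to `c`, which is how a small scope keeps a session local.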

Control describes the status, locus of control, and security measures activated for a session. Sessions with overlapping or diverging interests can merge or split. Such reconfiguration of sessions with regard to membership, and session events linked to specific phases, must be possible without session termination or restart of applications. The session status indicates whether the session is a partition of a larger session, frozen but still deemed active, merged, or revived. The tracking of states in coordination protocols and the outcome of coordination processes can be logged and persistent, or ephemeral.

Locus of control specifies whether membership and floor control are handled in one central location, partially distributed among several servers, or fully distributed across all hosts. Partial or full replication is possible for the latter two paradigms. A central controller can also rove among all sites to achieve better fault tolerance. Distributed control is multilateral, with varying degrees of "consentience" and "equipollence", i.e., how much everybody participates and how authorities and responsibilities are allocated. Multilateral control is either successive, partitioned, democratic, or anarchic. Successive controllership allows one distinct controller at a time, with the role alternating among users, whereas partitioned control lets several controllers each perform a subset of control operations. Democratic control lets all users contribute to the control process, e.g., via voting. Anarchic control gives all subjects complete freedom of action, and control of sharing is performed on a peer-to-peer basis.
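Two of the multilateral modes can be sketched compactly: successive control as a controller role handed around the member list, and democratic control as a vote. This is a minimal Python sketch, assuming round-robin alternation and a strict-majority rule; neither rule is prescribed by the framework, and the names `SuccessiveControl` and `democratic_decision` are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence


class SuccessiveControl:
    """One distinct controller at a time; the role alternates among users
    (round-robin order is an assumption of this sketch)."""

    def __init__(self, users: Sequence[str]) -> None:
        if not users:
            raise ValueError("successive control needs at least one user")
        self.users = list(users)
        self.index = 0

    @property
    def controller(self) -> str:
        return self.users[self.index]

    def pass_control(self) -> str:
        self.index = (self.index + 1) % len(self.users)  # hand off to next user
        return self.controller


def democratic_decision(votes: List[str]) -> Optional[str]:
    """Return the option backed by a strict majority of votes, else None
    (the majority rule is an assumption of this sketch)."""
    if not votes:
        return None
    option, count = Counter(votes).most_common(1)[0]
    return option if count * 2 > len(votes) else None
```

Partitioned control would run several such controllers side by side, each owning a subset of control operations; anarchic control simply has no controller object at all.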

The locus of control is related to the supervision attribute, which indicates whether the communication process in coordination is moderated, peer-reviewed, or free. A moderator decides which users may send information, what is forwarded to the receivers, or which receivers may receive a particular content or access a specific resource, thereby implementing a notion of floor control. McKinlay et al. [20] note for face-to-face meetings that the importance of chaired guidance increases with the session size and with the difficulty of the joint task, since each member's ability to participate and influence others is reduced. Finally, coordination touches upon security issues, specifying whether users are anonymous or authenticated in their exchanges, either at session initiation or at every turn, and whether information is encrypted.


2.1.2 Hierarchical Sessions

Rajan et al. [24] identify a confluence as a special session type in which all participants transmit and receive the same set of media streams, mixed together in broadcast, which saves bandwidth. The notion of confluences and session nesting leads to the concept of multilevel or hierarchical sessions. Session hierarchies permit aggregation of users at various levels of abstraction, reflecting interests, the stage of task completion, authorizations, temporary subgroups (coteries), or geographic proximity, and thus better mirror the inherently hierarchical group dynamics of face-to-face meetings. The hierarchy is denoted with the session level parameter $L$, which indicates numerically the position of Sid in a session hierarchy. For instance, in a 3-level hierarchy, a collaboration or master session has level 0, a session has level 1, and a subsession is at level 2, which may be sufficient to characterize most collaboration scenarios.

Rangan and Vin [25, 34] give formal definitions for collaborative systems, including conference, session, and stream abstractions, for the purpose of automated reasoning about the properties of multimedia collaborations. Adapting their definitions to the session context, we distinguish between simple sessions containing individual users, and super sessions, recursively consisting of other sessions $\Sigma_{i,L} \in \Sigma_i$ and individuals, with $L$ indicating the level of membership. We denote the outermost "root" session as the level-0 session. Many conference scenarios contain only two sublevels, subsessions with $L = 1$ and coteries with $L = 2$. Coteries permit private subgrouping for brief exchanges ("sidechats") [36] without requiring their members to leave the larger group context or open a separate multicast group. Neilsen and Mizuno describe a membership algorithm for joining and leaving coteries [21], and Texier and Plouzeau [32] propose object binding algorithms for multiple sessions; however, a sound mechanism for session management in multimedia collaboration is still missing. Concurrent sessions, as opposed to sequential sessions, allow users to participate in multiple sessions simultaneously. Hierarchical sessions permit inheritance of attributes from parent to child sessions, and aggregation of sibling sessions under a parent session. Vin et al. [35] describe such a hierarchical architecture for media mixing, as required in a telepresentation system or teleorchestra, and derive upper bounds for the media transmission capacity and the height of a hierarchy, given a number of participants and mixers, with one speaker being active at a time.

2.1.3 Users

Users in the user set $\mathcal{U}$ of a collaborative environment, also referred to as participants, subjects, or session members, are equated with hosts or their processes in protocol descriptions.

Definition 3 A user $U \in \mathcal{U}$ is a tuple

$$U = \langle Uid, Sid, Loc, T_j, T_l, A_U \rangle$$

where Uid is a unique identifier within the session Sid, Loc is the local or remote location, given as an IP address or unique host identifier, $T_j$ is the joining time, $T_l$ is the leaving time, and $A_U$ is a list of user attributes.

Processes can be system agents [10, 19] executing on behalf of a user. The user attributes $A_U$ are depicted in Figure 2.
Figure 2. User attributes.

Accordingly, users are characterized by their roles, authority, identity, entry capabilities, and access rights, which determine the applicable floor control strategy. Users can be co-located in the same space or geographically distributed. We distinguish between social and system roles. Social roles describe the function of a user within a session, e.g., being a panelist or lecturer. System roles refer to the control function within a floor control protocol. Participants without a specific role can be receivers or inactive. The owner of a resource $r \in \mathcal{R}$ is the node that injects $r$ into a session and initiates floor control for $r$; the resource may vanish from the session if the owner leaves. The floor coordinator (FC) is an arbiter over a resource $r$, or a session moderator, granting or denying a floor on $r$ during session time to the floor holder (FH), who attains the exclusive right to work on $r$ for a floor holding period. FC and FH may be located at different nodes, or be assumed by the same node. These roles may be statically assigned at session start, or rove among users during the session. Users without control roles are general session members, and can be active or inactive, depending on whether they invoke state transitions in the coordination mechanism.

A moderator is a special case of FC, where the coordinator role is assigned to a user to supervise content exchange, resource usage, and membership for a specific section of the full lifetime of a session. A moderator-driven session, mediated through a specific host, results in a centralized coordination scheme, even though the host topology may be decentralized, with the known shortcomings in efficiency and resiliency. Moderators may be selected because they start a session or are chosen by session members in advance, or they may be elected [13, 28] at session runtime. Tijdeman [33] discusses a solution to the chairman assignment problem such that at any time the accumulated number of chairmen from each state (or session) stays within a bounded deviation of its share proportional to its relative weight. Role-based floor control in dynamic sessions contrasts with static role-based access control (RBAC) models [27]. Roles can be inherited from a supersession to a subsession.
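The flavor of the chairman assignment problem can be conveyed with a simple greedy rule: at each turn, give the chair to the session whose accumulated count lags furthest behind its weighted share. This is a minimal Python sketch of that largest-deficit heuristic; it is not Tijdeman's algorithm and carries no claim about his deviation bound, and the function name and weights are hypothetical. Ties go to the session listed first.

```python
from fractions import Fraction
from typing import Dict, List


def assign_chairs(weights: Dict[str, Fraction], turns: int) -> List[str]:
    """Greedy largest-deficit heuristic (not Tijdeman's exact algorithm):
    at turn t, choose the session whose chair count lags furthest behind
    its weighted entitlement weight / total * t."""
    total = sum(weights.values())
    counts = {s: 0 for s in weights}
    schedule: List[str] = []
    for t in range(1, turns + 1):
        # Deficit = entitled share so far minus chairs already assigned.
        chosen = max(weights, key=lambda s: weights[s] / total * t - counts[s])
        counts[chosen] += 1
        schedule.append(chosen)
    return schedule
```

With weights 2:1 for sessions A and B, six turns yield the schedule A, B, A, A, B, A, so A has chaired four times and B twice, matching the 2:1 weighting; exact `Fraction` arithmetic avoids rounding ties.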

Authority defines whether the user is a simple participant, privileged as the system root user, or a moderator, linking this field with the role entries. A moderator can be a permanent FC. In its social role, the moderator equates to a session supervisor who is able to inspect all session turns between users. Identity specifies whether the user wants to remain anonymous or whether the Uid can be posted to the session. An Entry is either independent, i.e., unaware of the actions and entries of others; reflective, i.e., polling session members; consultative, based on "contextual clue messages"; partitioned, representing a subtask; based on voting among the group; or debriefed and recorded [11]. In addition, user entries may be temporary or permanent, and may be logged for the purpose of reviewing histories of collaborative sessions or undoing certain steps [23]. Access defines the basic privileges to work on a resource, in receive-only, send-and-receive, and send-only mode, in analogy to read and write authorizations in file systems. We introduce the notion of a group to describe associations of users within sessions.


Definition 4 A user group (multicast group) $G$ is a set of users $U$ with common session and user attributes, expressing a common media or task focus, such that $U \in G \subseteq \mathcal{U}$.

2.1.4 Resources

Multimedia collaborative systems use a polymorphic or multimodal mix of resources shared across networks. A resource can be an application, host object, or network entity shared in collaboration at various levels of granularity. Four primary classes of multimedia traffic with different Quality-of-Service characteristics exist [1]: control packets for coordination information are mostly of low volume, but need reliable transmission; real-time media transport time-critical information and tolerate some loss; elastic media are apt for discrete information with relaxed timing constraints, but tolerate no loss; and bulky media require high throughput and reliable transmission, but can tolerate some delay. We define resources as application components in our coordination framework:

Definition 5 A resource $R \in \mathcal{R}$ is a tuple

$$R = \langle Rid, Sid, Pid, Uid, T_c, T_d, A_R \rangle$$

where Rid is a unique resource identifier owned by user Uid within session Sid, Pid is the identifier of the parent resource that Rid belongs to, $T_c$ is the time of creation or injection of the resource into the collaborative workspace, $T_d$ is the deletion time, and $A_R$ is a list of resource attributes.

Figure 3. Resource attributes.

Rid designates both discrete and streaming media, and may contain the port on which the resource is transmitted. The resource attributes $A_R$ are depicted in Figure 3. The Pid value allows for recursive subsumption of resource components within resources, and hence sharing of resource components at an arbitrary granularity. For instance, users can share an entire window, or a graphical object within that window.
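Because every resource points to its parent through Pid, the components subsumed by a shared resource (a window and everything inside it, say) can be collected by walking the Pid links downward. This is a minimal Python sketch, assuming the Pid links form a tree without cycles and using `None` as a hypothetical marker for a top-level resource; the time fields and attributes of Definition 5 are omitted, and the names are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Resource:
    rid: int
    sid: int
    pid: Optional[int]  # parent resource; None marks a top-level resource
    uid: int            # owning user


def components(resources: Dict[int, Resource], root: int) -> List[int]:
    """All Rids subsumed by `root`, including root itself, found by
    following Pid links downward (assumes the links contain no cycles)."""
    children: Dict[int, List[int]] = {}
    for r in resources.values():
        if r.pid is not None:
            children.setdefault(r.pid, []).append(r.rid)
    result: List[int] = []
    stack = [root]
    while stack:  # iterative depth-first traversal
        rid = stack.pop()
        result.append(rid)
        stack.extend(children.get(rid, []))
    return result
```

Sharing a window then means granting access to `components(resources, window_rid)`, while sharing a single graphical object only covers that object's subtree.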


Class describes whether the resource is continuous or discrete. Type characterizes the media object class, indicating whether a resource is text-based, graphical, or some real-time medium, and identifies the purpose it serves. Resources can be virtual, or they can represent actual remote devices, for example, a surgical instrument in telemedicine. Resources can be mixed and need not be proprietary to the session from which they are accessed; they could also be hosted on a machine "outside" of the session. Coordination on text, the default medium for most collaborative systems, revolves around alternate typing, for instance in chat tools, or concurrent editing at granularities ranging from chapters or sections down to single sentences. Text can be plain ASCII, or one of various rich text formats with formatting commands. Graphics tools, such as drawing and design tools, necessitate coordination in time and space, either by marking areas on a shared canvas or objects for shared editing, or by introducing graphical widgets such as telepointers. Functions that compute or render the shared workspace in a specific way are another coordination component. Still images also require spatiotemporal coordination and allow for multiple image formats, such as TIFF, GIF, or JPEG. Audio tools for speech, or music data such as MIDI, require temporal coordination in recording and replay, and spatiotemporal coordination in editing. For instance, a shared audio channel or music stream requires sequenced access, whereas joint editing of a music score is a spatial aspect. Silence detection is useful for more efficient processing of audio streams, but also helps to trigger speaker floor switching. Video concerns motion image display and editing, either from a live source, stored locally, or replayed on demand, and is often used in combination with audio, requiring temporal coordination. Various formats, such as H.263 or MPEG, should be supported. Hypertext information is multimodal and integrates all of the above resource types using, for instance, HTML or XML, and is geared either for server push or client pull. VR (Virtual Reality) [8, 7, 15] is similarly multimodal, but adds input and output devices giving the user three-dimensional orientation or tactile sensations. Coordination must be interfaced with collision control [17] in virtual spaces. Code comprises application-specific structured documents such as PostScript, MIME email [4], or LaTeX. A device is a hardware unit serving as an access point, such as a camera. A multimedia conference is a conference using multimodal resource types.

Usage determines whether the resource can be used concurrently by multiple users or requires sequential processing with exclusive floors. For instance, a shared whiteboard allows for multiple concurrent telepointers with a small number of users, whereas a remotely controlled camera can only perform a positioning command for one user at a time. Priority sets an importance value on the transmission and processing of the information, preempting the dissemination of media with lower ratings. QoS defines the required Quality-of-Service [30] for the resource, including the tolerable loss, the required resolution, the maximum tolerable delay, and the color depth. Other criteria may be added depending on the nature of the resource, such as the channel number, frame rate, encoding scheme, or sampling rate. The Protection attribute denotes whether a resource is public, private, or proctored, which may be expressed with a numerical value, or work in analogy to the Bell-LaPadula model [2], discerning between top-secret, secret, confidential, and unclassified information [12]. The degree of security determines the required encryption level and method to prevent forgery of control states and coordination messages. In contrast to traditional models of protection, which grant access to a resource based on user identity, coordination-based access must take into account the task to be performed. The predominant practice of shielding internetworks with firewalls makes real-time collaboration very difficult and is a major impediment to the realization of Internet collaboration. While new concepts for secure collaboration architectures are emerging [14], efficient key management and encryption in conjunction with floor control have yet to be developed.
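If the Protection attribute follows the Bell-LaPadula analogy, a floor request can be screened by the model's two core rules: no read up (simple security property) and no write down (the *-property). This is a minimal Python sketch covering only linear levels, without the model's categories or compartments; mapping a floor request onto a read or write check via `may_take_floor` is a hypothetical illustration, not part of the framework.

```python
from enum import IntEnum


class Level(IntEnum):
    UNCLASSIFIED = 0
    CONFIDENTIAL = 1
    SECRET = 2
    TOP_SECRET = 3


def may_read(subject: Level, obj: Level) -> bool:
    # Simple security property: a subject may not read above its clearance.
    return subject >= obj


def may_write(subject: Level, obj: Level) -> bool:
    # *-property: a subject may not write below its clearance,
    # so information cannot leak to a lower level.
    return obj >= subject


def may_take_floor(clearance: Level, resource: Level, wants_write: bool) -> bool:
    """Hypothetical screening of a floor request against resource protection."""
    if wants_write:
        return may_write(clearance, resource)
    return may_read(clearance, resource)
```

Note the consequence for collaboration: a top-secret user may view an unclassified whiteboard but may not take a write floor on it, which is one reason protection and floor control need to be designed together.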

Figure 4. Resource access scenarios: (a) Centralization, (b) Producer-Consumer, (c) Replication, (d) Distribution, (e) Multi-resource access, (f) Multi-resource consumption.

Resources $r \in \mathcal{R}$ can be located at one particular node, be distributed in their components across the node set, or be replicated over all nodes. Figure 4 depicts access paradigms for shared resources. In case (a), one or more resources are centralized and accessed by multiple parties; case (b) lets one host produce a resource and other hosts consume it; case (c) shows the case where each party maintains a local replica of the same resource, exchanging updates on a regular basis; in case (d), all hosts maintain partial information on the shared resource, using a distributed protocol to aggregate the information; and cases (e) and (f) show access to, or consumption of, multiple resources by multiple parties. These constellations are the baseline for configuring a coordination mechanism to adapt to various constellations of the shared workspace. A location mechanism for resources within sessions and a mapping scheme from resource objects to multicast groups are needed, as partially implemented with the CCCP protocol [16].
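One conceivable mapping scheme derives a multicast group address deterministically from the (Sid, Rid) pair, so that every host computes the same group for a resource without a lookup. This is a minimal Python sketch of such a hypothetical scheme, using the administratively scoped IPv4 block 239.0.0.0/8 and a SHA-256 hash; it is not how CCCP works, and a deployable scheme would still need collision detection or an address allocation protocol, since distinct pairs can hash to the same group.

```python
import hashlib
import ipaddress

# Administratively scoped IPv4 multicast block (RFC 2365).
ADMIN_SCOPED = ipaddress.ip_network("239.0.0.0/8")


def group_address(sid: int, rid: int) -> ipaddress.IPv4Address:
    """Hypothetical deterministic mapping of (Sid, Rid) to a multicast
    group; collisions are possible and are not handled here."""
    digest = hashlib.sha256(f"{sid}:{rid}".encode()).digest()
    offset = int.from_bytes(digest[:4], "big") % ADMIN_SCOPED.num_addresses
    return ADMIN_SCOPED.network_address + offset
```

Because the address is a pure function of the identifiers, joining hosts only need the Sid and Rid announced for the session; the trade-off is that administrative scoping and the session's Scope attribute must still agree on how far the group's traffic may travel.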

2.1.5 Floors

A floor is a temporary access and manipulation privilege for multimedia resources in interactive groupwork, generalized to the domain of CSCW from the "right to speak" [26]. A floor control protocol mediates access to shared objects by granting floors according to a group-specific service policy.

Definition 6 A floor $F \in \mathcal{F}$ is a tuple

$$F = \langle Fid, Rid, Uid, T_i, T_d, A_F \rangle$$

where Fid is a unique floor identifier within the shared workspace for a resource Rid, assigned to user Uid at inception time $T_i$ and deactivated at time $T_d$, with $A_F$ denoting a list of floor attributes.

Note that one Rid may have multiple Fids assigned for control of various granules, but each floor controls exactly one resource. Floors are indirectly associated with sessions via Rid, and floor properties may be inherited from a master resource to its subcomponents. We assume that one floor F is assigned per resource component. The pairing (Fid, Rid) specifies the granularity of control and the commands available with possession of the floor. A floor can control an entire conference, an application, a single window, or a shared object [18]. For instance, for audio the associated commands may be talk, mute, and pause. Video floor commands are, for instance, caption, forward, cut, and replay. Floors can be static relative to a session lifetime, or dynamic, i.e., assigned ad hoc by a computer or social protocol. The combination of Uid and the attributes specifies whether the user is FC, FH, chair, or general participant. The inception and deactivation times may be set using real-time clocks or a logical session time. Figure 5 depicts the floor attributes.
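Inheritance of floor properties from a master resource to its subcomponents maps naturally onto layered lookup: a component's own settings win, and anything it does not set is taken from the nearest enclosing resource. This is a minimal Python sketch using the standard `collections.ChainMap`; the class name and the example attribute keys (`usage`, `policy`, `lifetime`) are hypothetical stand-ins for the attributes in Figures 3 and 5.

```python
from collections import ChainMap
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ResourceNode:
    rid: int
    parent: Optional["ResourceNode"] = None  # master resource, if any
    floor_attrs: Dict[str, Any] = field(default_factory=dict)  # set locally

    def effective_floor_attrs(self) -> ChainMap:
        """Local floor attributes first, then inherited ones, nearest
        ancestor first; writes go to the local layer only."""
        maps: List[Dict[str, Any]] = []
        node: Optional[ResourceNode] = self
        while node is not None:
            maps.append(node.floor_attrs)
            node = node.parent
        return ChainMap(*maps)
```

A graphical object inside a shared window can thus override the window's exclusive usage with concurrent usage while still inheriting, say, the window's queued request policy.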

Figure 5. Floor attributes.

With regard to directionality, we discern between sender floors and receiver floors. A receiver floor refers to the passive control concept that a user can filter or deny specific received streams ("What I See Is What I Want"). Floor control typically refers to source-based control, which may reduce traffic significantly ("What You See Is What I Share"). State defines the generic operational states of a floor control mechanism. Free denotes an available, unused floor; Idle denotes an assigned but inactive floor; Req marks a floor as being requested; Busy is the tag for a granted and assigned floor; Revoked marks a floor whose lifetime is shortened by a moderator or a preemption mechanism; Frozen marks a floor in a pending session; and Invalid identifies a nonexistent floor. A generic floor control protocol defining the transitions between these states is depicted in Figure 6.


Figure 6. Generic floor control protocol.
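The transitions of Figure 6 are not reproduced in this text, so the Python sketch below uses an assumed, illustrative transition table built only from the state descriptions above. The event names (`request`, `grant`, `freeze`, `thaw`, and so on) and the `FloorFSM` class are hypothetical. The point is the shape of a generic floor state machine, not the exact protocol in the figure.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class FloorState(Enum):
    FREE = "Free"        # available, unused floor
    IDLE = "Idle"        # assigned, but inactive
    REQ = "Req"          # floor is being requested
    BUSY = "Busy"        # granted and assigned
    REVOKED = "Revoked"  # lifetime cut short by a moderator or preemption
    FROZEN = "Frozen"    # floor belongs to a pending (suspended) session
    INVALID = "Invalid"  # nonexistent floor


S = FloorState

# Assumed transition table: (state, event) -> next state.
# Event names are illustrative and not taken from Figure 6.
TRANSITIONS: Dict[Tuple[FloorState, str], FloorState] = {
    (S.FREE, "request"): S.REQ,
    (S.REQ, "grant"): S.BUSY,
    (S.REQ, "deny"): S.FREE,
    (S.BUSY, "pause"): S.IDLE,
    (S.IDLE, "activate"): S.BUSY,
    (S.BUSY, "release"): S.FREE,
    (S.IDLE, "release"): S.FREE,
    (S.BUSY, "revoke"): S.REVOKED,
    (S.IDLE, "revoke"): S.REVOKED,
    (S.REVOKED, "reclaim"): S.FREE,
}


class FloorFSM:
    """Tracks a single floor through the generic states named in the text."""

    def __init__(self) -> None:
        self.state = S.FREE
        self._saved: Optional[FloorState] = None  # state before a freeze

    def fire(self, event: str) -> FloorState:
        if self.state is S.INVALID:
            raise ValueError("operation on a nonexistent floor")
        if event == "destroy":
            self.state = S.INVALID
        elif event == "freeze" and self.state is not S.FROZEN:
            # Session becomes pending: remember where the floor was.
            self._saved, self.state = self.state, S.FROZEN
        elif event == "thaw" and self.state is S.FROZEN:
            # Session resumes: restore the pre-freeze state.
            self.state, self._saved = self._saved, None
        elif (self.state, event) in TRANSITIONS:
            self.state = TRANSITIONS[(self.state, event)]
        else:
            raise ValueError(f"illegal event {event!r} in state {self.state.value}")
        return self.state
```

Under these assumptions, a floor that is requested, granted, frozen together with its session, thawed, and then revoked by a moderator passes through Req, Busy, Frozen, Busy, and Revoked. Any event not listed for the current state is rejected rather than silently ignored.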


Telepointers on a whiteboard canvas, for instance, may coexist in multiple renditions. Disjoint parties may likewise receive separate instances of a floor; e.g., user groups {U1, U2}_A and {U3, U4, U5}_A may converse independently, each with its own instance of the audio floor F = A.
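Instantiation can be sketched as a per-floor limit on concurrent instances, with each instance bound to one group of users. The class below, its `max_instances` parameter, and the rule that groups holding separate instances must not share members are assumptions made for illustration, following the disjoint-groups example in the text.

```python
from typing import Dict, FrozenSet, Optional


class FloorInstances:
    """Hands out instances of one floor, up to a fixed limit (hypothetical)."""

    def __init__(self, name: str, max_instances: Optional[int]) -> None:
        self.name = name
        self.max_instances = max_instances  # 1 = single floor; None = unlimited
        self.instances: Dict[int, FrozenSet[str]] = {}

    def open_instance(self, group: FrozenSet[str]) -> int:
        """Give a group its own instance; groups that share users are refused."""
        if self.max_instances is not None and len(self.instances) >= self.max_instances:
            raise RuntimeError(f"floor {self.name} allows {self.max_instances} instance(s)")
        for members in self.instances.values():
            if members & group:  # non-empty intersection: parties are not disjoint
                raise ValueError("groups holding separate instances must be disjoint")
        iid = len(self.instances)  # instances are never closed in this sketch
        self.instances[iid] = group
        return iid
```

A remote instrument would be modeled with `max_instances=1`, whereas the audio floor A in the example above admits one instance for {U1, U2} and another for {U3, U4, U5}.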

Passing describes whether floor management is visible to end-users or transparent to them. Explicit control gives users handles to initiate turns through the exchange of markers that signify possession of the floor; implicit control, in contrast, exchanges no such markers to transfer floors. Control may follow a programmed session agenda or allow for open interaction. Explicit control is manifested, for instance, by pressing the Request button in a shared application; implicit control is realized by users observing inactivity on the resource and taking action when appropriate.

Policy defines the request and usage rules. The request policy determines whether floor requests are satisfied immediately, queued and served according to a queuing policy, or discarded when there is no opening for the floor. A chairperson may preempt any floor activity. UseLifetime denotes whether a floor can be used indefinitely until it is requested, or whether a timer or moderator controls the duration of usage. Modality distinguishes between main floors, assigned for primary communication from a sender to a receiver, and backchannel floors, used to give brief feedback.
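The request policy, chair preemption, and UseLifetime attributes can be illustrated with a small arbiter for one exclusive floor. This is a sketch under stated assumptions: FIFO order stands in for the unspecified queuing policy, usage lifetime is counted in logical ticks, and `FloorArbiter` with its method names is hypothetical rather than an interface defined in this paper.

```python
from collections import deque
from typing import Deque, Optional


class FloorArbiter:
    """Grants one exclusive floor under a request policy (hypothetical API).

    policy="queue":   requests that find the floor busy wait in FIFO order.
    policy="discard": requests that find the floor busy are rejected.
    max_hold:         usage lifetime in logical ticks; None = until released.
    """

    def __init__(self, policy: str = "queue", max_hold: Optional[int] = None) -> None:
        if policy not in ("queue", "discard"):
            raise ValueError("policy must be 'queue' or 'discard'")
        self.policy = policy
        self.max_hold = max_hold
        self.holder: Optional[str] = None
        self.held_for = 0
        self.waiting: Deque[str] = deque()

    def request(self, uid: str) -> bool:
        """Return True if the floor is granted immediately."""
        if self.holder is None:
            self._grant(uid)
            return True
        if self.policy == "queue" and uid != self.holder and uid not in self.waiting:
            self.waiting.append(uid)
        return False  # queued or discarded, depending on the policy

    def release(self, uid: str) -> None:
        if uid == self.holder:
            self._pass_on()

    def preempt(self, chair: str) -> None:
        """A chairperson takes the floor at once; the displaced holder is not requeued."""
        self._grant(chair)

    def tick(self) -> None:
        """Advance logical time and revoke the floor when its lifetime expires."""
        if self.holder is not None and self.max_hold is not None:
            self.held_for += 1
            if self.held_for >= self.max_hold:
                self._pass_on()

    def _grant(self, uid: str) -> None:
        self.holder, self.held_for = uid, 0

    def _pass_on(self) -> None:
        # Free the floor and hand it to the next waiting requester, if any.
        self.holder = None
        if self.waiting:
            self._grant(self.waiting.popleft())
```

With `policy="queue"` and `max_hold=2`, a second requester waits and obtains the floor automatically once the first holder's lifetime expires; with `policy="discard"`, the same request simply fails and must be reissued. An explicit Request button would map onto `request`, while implicit control would bypass the arbiter entirely.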

We can distinguish four paradigms for dealing with race conditions in cooperative work: blocking conflicts with exclusive locks, disallowing conflicts with permission tokens, mitigating conflicts by detecting dependencies and reordering activities into non-conflicting series, and resolving inconsistencies created through conflict. The first two paradigms are restrictive and prevent conflicts; the latter two are permissive and allow progress into conflict, subject to preconditions and postconditions. The choice is therefore between pessimistic control, which follows the premise of conflict avoidance, and optimistic control, which allows conflicts and provides means such as dependency detection [29] to resolve them.
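The contrast between pessimistic and optimistic control can be sketched in a few lines of Python. The pessimistic object serializes writers with an exclusive lock; the optimistic object lets writers proceed on a snapshot and detects conflicts at commit time with a version counter. The version check is a common, generic way to detect conflicting updates, used here as an assumption; it should not be read as the dependency-detection technique cited in [29]. Both class names are hypothetical.

```python
import threading
from typing import Any, Callable, Tuple


class PessimisticObject:
    """Conflict avoidance: every writer must hold an exclusive lock."""

    def __init__(self, value: Any) -> None:
        self._lock = threading.Lock()
        self.value = value

    def update(self, fn: Callable[[Any], Any]) -> Any:
        with self._lock:  # concurrent writers block here until the lock is free
            self.value = fn(self.value)
            return self.value


class OptimisticObject:
    """Conflict detection: writers work on a snapshot and validate at commit.

    Single-threaded sketch: commit() itself is not made atomic here.
    """

    def __init__(self, value: Any) -> None:
        self.value, self.version = value, 0

    def read(self) -> Tuple[Any, int]:
        return self.value, self.version

    def commit(self, new_value: Any, based_on: int) -> bool:
        if based_on != self.version:  # another commit happened since the read
            return False              # caller must resolve: re-read, merge, retry
        self.value, self.version = new_value, self.version + 1
        return True
```

Two participants who read the same snapshot can both edit it, but only the first commit succeeds; the second is rejected and must re-read and merge. This corresponds to the permissive strategy of allowing progress into conflict and resolving it afterwards, whereas the lock never lets the conflict arise.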

Previously [9], we proposed integrating group coordination services with the IP-multicast infrastructure and framework, which is currently being deployed gradually on the Internet, so that coordination services run on top of reliable multicast and ideally operate on the same logical network topology. This approach eliminates the need for a separate control infrastructure for tracking, routing, withholding, or forwarding coordination directives, and it enables distributed activities in large groups and over large distances with low latency.

The presented model serves both theoretical and practical purposes. It provides a more elaborate framework for the formal specification and validation of collaborative systems, e.g., with the Prototype Verification System (PVS) [24]. It also allows for session capability descriptions [22] for on-the-fly specification, setup, and query of the membership and coordination status of an active conference. A capability is understood as a resource or system feature that influences the selection of useful configurations for components.


Floor control can be relaxed for concurrent activities where the chance of direct conflict is small, e.g., joint editing of text paragraphs, but it must be strict for mutually exclusive activities such as speaking over the same audio channel. Instantiation accordingly defines how many instances of the same floor may exist concurrently in the system. A remote instrument with exactly one function to be shared permits only a single floor instance, whereas a shared whiteboard may admit several concurrent telepointers.


3  SUMMARY 

Our objective is to explore the key elements of a new breed of coordination protocols and architectures useful for engineering computer-supported cooperative work applications that operate at Internet scope. We view coordination as the third integral component of group-oriented communication services in the Internet, complementing group dissemination and membership protocols and enriching the current IP-multicast service model, which lacks refined support for group coordination. The goal of our framework is to characterize the relevant parameters in designing an API for the rapid development of applications that support group coordination. To this end, we have developed a novel taxonomy of typical coordination components in collaborative multimedia applications.